 emotional intelligence


'We May Have a Crisis on Our Hands': The Unregulated Rise of Emotionally Intelligent AI

TIME - Tech

Pillay is an editorial fellow at TIME. At least once a month, two-thirds of people who regularly use AI turn to their bots for advice on sensitive personal issues and emotional support. Many people now report trusting their chatbots more than their elected representatives, civil servants, faith leaders--and the companies building AI. That's according to data from 70 countries, gathered by the Collective Intelligence Project (CIP).


Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning

Neural Information Processing Systems

The ability to discern and comprehend pragmatic meanings, referred to as pragmatic reasoning, is a cornerstone of social and emotional intelligence. Despite the strides made in the development of Large Language Models (LLMs), such as ChatGPT, these models struggle to capture the nuanced and ambiguous facets of language, falling short of the aspiration to build human-like conversational agents.


EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation

Long, Yunbo, Liu, Yuhan, Brintrup, Alexandra

arXiv.org Artificial Intelligence

The deployment of large language models (LLMs) in automated negotiation has set a high performance benchmark, but their computational cost and data privacy requirements render them unsuitable for many privacy-sensitive, on-device applications such as mobile assistants, embodied AI agents or private client interactions. While small language models (SLMs) offer a practical alternative, they suffer from a significant performance gap compared to LLMs in playing emotionally charged complex personas, especially for credit negotiation. This paper introduces EQ-Negotiator, a novel framework that bridges this capability gap using emotional personas. Its core is a reasoning system that integrates game theory with a Hidden Markov Model (HMM) to learn and track debtor emotional states online, without pre-training. This allows EQ-Negotiator to equip SLMs with the strategic intelligence to counter manipulation while de-escalating conflict and upholding ethical standards. Through extensive agent-to-agent simulations across diverse credit negotiation scenarios, including adversarial debtor strategies like cheating, threatening, and playing the victim, we show that a 7B parameter language model with EQ-Negotiator achieves better debt recovery and negotiation efficiency than baseline LLMs more than 10 times its size. This work advances persona modeling from descriptive character profiles to dynamic emotional architectures that operate within privacy constraints. The paper also establishes that strategic emotional intelligence, not raw model scale, is the critical factor for success in automated negotiation, paving the way for effective, ethical, and privacy-preserving AI negotiators that can operate on the edge.


Intelligent Agents with Emotional Intelligence: Current Trends, Challenges, and Future Prospects

Zall, Raziyeh, Kheyrkhah, Alireza, Cambria, Erik, Naseri, Zahra, Kangavari, M. Reza

arXiv.org Artificial Intelligence

Developing intelligent agents that possess human-level intelligence is a key goal in the field of human-computer interaction (HCI) and general artificial intelligence [2]. A crucial aspect of achieving this goal is the incorporation of emotional intelligence, which is essential for human cognition and social interaction, into these intelligent agents. Emotional intelligence encompasses three interrelated capabilities: 1) emotion understanding, which involves accurately detecting and understanding affective signals, such as recognizing individuals' emotional states during interactions; 2) emotion elicitation and experiences, which refers to interpreting the causes, context, and implications of emotions for both the individual and the interaction; and 3) emotion expression, which encompasses the capacity to generate, modulate, and convey appropriate emotional responses in a socially meaningful manner. Affective Computing, coined by Rosalind Picard [1], emerged as a discipline dedicated to equipping machines with emotional intelligence, enabling them to recognize, interpret, and respond to human emotions. By embedding emotional intelligence into intelligent agents, affective computing facilitates more naturalistic, adaptive, and socially competent interactions, which in turn enhances user trust, engagement, and satisfaction [209]. Such emotionally intelligent systems not only improve usability but also enable advanced functionalities, including personalized assistance, empathetic dialogue, and context-aware decision-making. In Figure 1, an overview of the emotional intelligence capabilities in intelligent agents is presented. The process of emotional intelligence begins with analyzing the emotional aspects of the user input, enabling the agent to identify the user's affective state during interactions [259][306]. The next step is affective cognition, where the agent evaluates the observed emotional events using cognitive mental states to ensure accurate interpretation.


MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models

Deng, Yayue, Hu, Guoqiang, Sun, Haiyang, Zhang, Xiangyu, Zhang, Haoyang, Tian, Fei, Yang, Xuerui, Yu, Gang, Chng, Eng Siong

arXiv.org Artificial Intelligence

Spoken Dialogue Models (SDMs) have advanced rapidly, yet their ability to sustain genuinely interactive multi-turn conversations remains underexplored, as most benchmarks focus on single-turn exchanges. We introduce Multi-Bench, the first benchmark explicitly designed to evaluate SDMs in multi-turn interactive dialogue with an emphasis on emotional intelligence. Multi-Bench employs a hierarchical structure with a basic track for emotion understanding and reasoning and an advanced track for emotion support and application. It comprises five carefully designed tasks and about 3.2K samples, ranging from emotion recognition to complex reasoning and interactive dialogue, supported by a reproducible evaluation framework. We evaluate six representative SDMs on eight subsets of Multi-Bench. Results show that while current SDMs achieve good performance on basic understanding tasks, they still have room for improvement in advanced multi-turn interactive dialogue and reasoning-related tasks, particularly in emotion awareness and application.


Seeing is Not Understanding: A Benchmark on Perception-Cognition Disparities in Large Language Models

Li, Haokun, Zhang, Yazhou, Ding, Jizhi, Li, Qiuchi, Zhang, Peng

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) have advanced rapidly and demonstrated exceptional capabilities across a variety of vision-language tasks. However, current evaluation benchmarks predominantly focus on objective visual question answering or captioning, inadequately assessing the models' ability to understand complex and subjective human emotions. To bridge this gap, we introduce EmoBench-Reddit, a novel, hierarchical benchmark for multimodal emotion understanding. The dataset comprises 350 meticulously curated samples from the social media platform Reddit, each containing an image, associated user-provided text, and an emotion category (sad, humor, sarcasm, happy) confirmed by user flairs. We designed a hierarchical task framework that progresses from basic perception to advanced cognition, with each data point featuring six multiple-choice questions and one open-ended question of increasing difficulty. Perception tasks evaluate the model's ability to identify basic visual elements (e.g., colors, objects), while cognition tasks require scene reasoning, intent understanding, and deep empathy integrating textual context. We ensured annotation quality through a combination of AI assistance (Claude 4) and manual verification. We conducted a comprehensive evaluation of nine leading MLLMs, including GPT-5, Gemini-2.5-pro, and GPT-4o, on EmoBench-Reddit.


PersonaFuse: A Personality Activation-Driven Framework for Enhancing Human-LLM Interactions

Tang, Yixuan, Yang, Yi, Abbasi, Ahmed

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) demonstrate remarkable capabilities across various fields. These developments have led to more direct communication between humans and LLMs in various situations, such as social companionship and psychological support. However, LLMs often exhibit limitations in emotional perception and social competence during real-world conversations. These limitations partly originate from their inability to adapt their communication style and emotional expression to different social and task contexts. In this work, we introduce PersonaFuse, a novel LLM post-training framework that enables LLMs to adapt and express different personalities for varying situations. Inspired by Trait Activation Theory and the Big Five personality model, PersonaFuse employs a Mixture-of-Expert architecture that combines persona adapters with a dynamic routing network, enabling contextual trait expression. Experimental results show that PersonaFuse substantially outperforms baseline models across multiple dimensions of social-emotional intelligence. Importantly, these gains are achieved without sacrificing general reasoning ability or model safety, which remain common limitations of direct prompting and supervised fine-tuning approaches. PersonaFuse also delivers consistent improvements in downstream human-centered applications, such as mental health counseling and review-based customer service. Finally, human preference evaluations against leading LLMs, including GPT-4o and DeepSeek, demonstrate that PersonaFuse achieves competitive response quality despite its comparatively smaller model size. These findings demonstrate that PersonaFuse offers a theoretically grounded and practical approach for developing social-emotional enhanced LLMs, marking a significant advancement toward more human-centric AI systems.


Integrating emotional intelligence, memory architecture, and gestures to achieve empathetic humanoid robot interaction in an educational setting

Sun, Fuze, Li, Lingyu, Meng, Shixiangyue, Teng, Xiaoming, Payne, Terry R., Craig, Paul

arXiv.org Artificial Intelligence

This study investigates the integration of individual human traits into an empathetically adaptive educational robot tutor system designed to improve student engagement and learning outcomes, with a corresponding Engagement Vector measurement. While prior research in the field of Human-Robot Interaction (HRI) has examined traits such as emotional intelligence, memory-driven personalization, and non-verbal communication individually, it has thus far neglected their synchronized integration into a cohesive, operational education framework. To address this gap, we customize a Multi-Modal Large Language Model (LLaMa 3.2 from Meta) deployed with modules for human-like traits (emotion, memory and gestures) into an AI-Agent framework. This constitutes the robot's intelligent core, mimicking the human emotional system, memory architecture and gesture control to allow the robot to behave more empathetically while recognizing and responding appropriately to the student's emotional state. It can also recall the student's past learning record and adapt its style of interaction accordingly. This allows the robot tutor to react to the student in a more sympathetic manner by delivering personalized verbal feedback synchronized with relevant gestures. Our study investigates the extent of this effect through the introduction of the Engagement Vector Model, which can serve as a surveyor's pole for judging the quality of the HRI experience. Quantitative and qualitative results demonstrate that such an empathetic responsive approach significantly improves student engagement and learning outcomes compared with a baseline humanoid robot without these human-like traits. This indicates that robot tutors with empathetic capabilities can create a more supportive, interactive learning experience that ultimately leads to better outcomes for the student.


MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models

Zhang, Fan, Cheng, Zebang, Deng, Chong, Li, Haoxuan, Lian, Zheng, Chen, Qian, Liu, Huadai, Wang, Wen, Zhang, Yi-Fan, Zhang, Renrui, Guo, Ziyu, Zhu, Zhihong, Wu, Hao, Wang, Haixin, Zheng, Yefeng, Peng, Xiaojiang, Wu, Xian, Wang, Kun, Li, Xiangang, Ye, Jieping, Heng, Pheng-Ann

arXiv.org Artificial Intelligence

Recent advances in multimodal large language models (MLLMs) have catalyzed transformative progress in affective computing, enabling models to exhibit emergent emotional intelligence. Despite substantial methodological progress, current emotional benchmarks remain limited, as it is still unknown: (a) the generalization abilities of MLLMs across distinct scenarios, and (b) their reasoning capabilities to identify the triggering factors behind emotional states. To bridge these gaps, we present MME-Emotion, a systematic benchmark that assesses both emotional understanding and reasoning capabilities of MLLMs, enjoying scalable capacity, diverse settings, and unified protocols. As the largest emotional intelligence benchmark for MLLMs, MME-Emotion contains over 6,000 curated video clips with task-specific questioning-answering (QA) pairs, spanning broad scenarios to formulate eight emotional tasks. It further incorporates a holistic evaluation suite with hybrid metrics for emotion recognition and reasoning, analyzed through a multi-agent system framework. Through a rigorous evaluation of 20 advanced MLLMs, we uncover both their strengths and limitations, yielding several key insights: (1) Current MLLMs exhibit unsatisfactory emotional intelligence, with the best-performing model achieving only 39.3% recognition score and 56.0% Chain-of-Thought (CoT) score on our benchmark. (2) Generalist models (e.g., Gemini-2.5-Pro) derive emotional intelligence from generalized multimodal understanding capabilities, while specialist models (e.g., R1-Omni) can achieve comparable performance through domain-specific post-training adaptation. By introducing MME-Emotion, we hope that it can serve as a foundation for advancing MLLMs' emotional intelligence in the future.


GPT-5 Doesn't Dislike You--It Might Just Need a Benchmark for Emotional Intelligence

WIRED

Since the all-new ChatGPT launched on Thursday, some users have mourned the disappearance of a peppy and encouraging personality in favor of a colder, more businesslike one (a move seemingly designed to reduce unhealthy user behavior). The backlash shows the challenge of building artificial intelligence systems that exhibit anything like real emotional intelligence. Researchers at MIT have proposed a new kind of AI benchmark to measure how AI systems can manipulate and influence their users--in both positive and negative ways--in a move that could perhaps help AI builders avoid similar backlashes in the future while also keeping vulnerable users safe. Most benchmarks try to gauge intelligence by testing a model's ability to answer exam questions, solve logical puzzles, or come up with novel answers to knotty math problems. As the psychological impact of AI use becomes more apparent, we may see MIT propose more benchmarks aimed at measuring more subtle aspects of intelligence as well as machine-to-human interactions.